import {Head} from 'mdx-deck';
import WideLayout, {WideLayoutWithSmallCode} from './components/WideLayout';
import MainTitle from './components/MainTitle';
import Title from './components/Title';
import styled from 'styled-components';
export theme from './theme';

<Head>
  <title>FOSDEM 2019 - Developing data structures for JavaScript</title>
  <link href="https://fonts.googleapis.com/css?family=Muli:900" rel="stylesheet" />
</Head>

<MainTitle />

<small>JavaScript devroom, FOSDEM 2019, Brussels</small>

---

## Why and how to implement efficient data structures to use with node.js or in the browser?

---

## Who am I?

*Guillaume Plique*

alias `Yomguithereal` on both [Github](https://github.com/Yomguithereal) and [Twitter](https://twitter.com/Yomguithereal).

Research engineer working for Sciences Po's [médialab](https://medialab.sciencespo.fr).

---

## What's a data structure?

---

> «Web development is not real development and is henceforth easier.»

<div style={{textAlign: 'right'}}>
  <small>
    Someone wrong on the Internet.
  </small>
</div>

---

> «Web development is trivial and web developers don't need fancy data structures or any solid knowledge in algorithmics.»

<div style={{textAlign: 'right'}}>
  <small>
    Someone also wrong (and pedant) on the Internet.
  </small>
</div>

---

### Don't we already have fully satisfying data structures in JavaScript?

* `Array` ➡ lists of things
* `Object` ➡ key-value associations
* `Map` and `Set` with ES6

---

<Title affix="1.">Why would we want other data structures in JavaScript?</Title>

---

<Title affix="1.1.">Convenience and bookkeeping</Title>

---

export default WideLayout

<Title affix="1.1.1." level={3}>A MultiSet</Title>

```js
// How about changing this:
const counts = {};

for (const item in something) {
  if (!(item in counts))
    counts[item] = 0;

  counts[item]++;
}
```

```js
// Into this:
const counts = new MultiSet();

for (const item in something)
  counts.add(item);
```

---

<Title affix="1.1.2." level={3}>Complex structures: a Graph</Title>

Sure, you can "implement" graphs using only `Array` and `Object`™.

But:

* Lots of bookkeeping (multi-way indexation)
* Wouldn't it be nice to have a legible interface?

---

Examples taken from the [graphology](https://graphology.github.io/) library:

```js
const graph = new Graph();

// Finding specific neighbors
const neighbors = graph.outNeighbors(node);

// Iterating over a node's edges
graph.forEachEdge(node, (edge, attributes) => {
  console.log(attributes.weight);
});
```

---

<Title affix="1.2.">Sometimes Arrays and Objects are not enough</Title>

---

<Title affix="1.2.1." level={3}>More than just tacky website candy</Title>

* We process data on the client nowadays.
* Node.js became a thing.
* Some algorithms cannot be efficiently implemented without custom data structures (Dijkstra or Inverted Index for full text search etc.).

---

<Title affix="1.2.2." level={3}>The QuadTree</Title>

![Quad](./img/quad.svg)

---

<Title affix="1.2.2." level={3}>The QuadTree</Title>

![QuadTree](./img/quadtree.svg)

<!-- ---

<Title affix="1.3.">Immutable data structures</Title>

* Not *yet* in JavaScript.
* Not covered by this talk.
* Check out [mori](https://github.com/swannodette/mori) and [immutable-js](https://facebook.github.io/immutable-js/). -->

---

<Title affix="2.">What are the challenges?</Title>

---

<Title affix="2.1." level={3}>Interpreted languages are far from the metal</Title>

---

<Title affix="2.2." level={3}>No control over memory layout</Title>

<Title affix="2.2." level={3}>No control over garbage collection</Title>

---

<Title affix="2.3." level={3}>JIT &amp; optimizing engines such as Gecko / V8</Title>

---

Benchmarking code accurately is **not** easy.

---

It does not mean we cannot be **clever** about it.

---

<Title affix="3.">Implementation tips</Title>

---

<Title affix="3.1." level={3}>Time &amp; memory performance</Title>

---

<Title affix="3.1.1." level={3}>Minimizing lookups</Title>

"Hashmap" lookups are costly.

```js
// You made 2 lookups
Graph.prototype.getNodeAttribute = function(node, data) {
  if (this._nodes.has(node))
    throw Error(...);

  const data = this._nodes.get(node);

  return data[name];
};
```

---

```js
// You made only one
Graph.prototype.getNodeAttribute = function(node, data) {
  const data = this._nodes.get(node);

  if (typeof data === 'undefined')
    throw Error(...);

  return data[name];
};
```

---

export default WideLayout

```
# Result, 100k items
--------------------
Two lookups: 31.275ms
One lookup:  15.762ms
```

The engine is clever. But not *that* clever. <small className="very-small">(It improves frequently, though...)</small>

The «*let's code badly, the engine will clean up my mess*» approach will not work.

---

<Title affix="3.1.2." level={3}>Creating objects is costly</Title>

* Avoid allocating objects.
* Avoid /(?:re\-)?creating/ regexes.
* Avoid nesting functions whenever possible.

---

```js
// BAD!
const test = x => /regex/.test(x);

// GOOD!
const REGEX = /regex/;
const test = x => REGEX.test(x);

// BAAAAAD!
function(array) {
  array.forEach(subarray => {

    // You just created one function per subarray!
    subarray.forEach(x => console.log(x));
  });
}
```

---

export default WideLayout

<Title affix="3.1.3." level={3}>Mixing types is bad</Title>

```js
// Why do you do that?
// If you are this kind of person, can we meet?
// I really want to understand.
const array = [1, 'two', '3', /four/, {five: new Date()}];
```
---

<Title affix="3.1.4." level={3}>The poor man's malloc</Title>

* Byte arrays are fan-ta-stic.
* Byte arrays are light.
* You can simulate typed memory allocation: `Uint8Array`, `Float32Array` etc.

---

<Title affix="3.1.4." level={3}>Implement your own pointer system!</Title>

And have your very own "C in JavaScript"™.

---

export default WideLayout

```txt
A linked list (with pointers):
------------------------------
head -> (a) -> (b) -> (c) -> ø
```

```js
// Using object references as pointers
function LinkedListNode(value) {
  this.next = null;
  this.value = value;
}
// Changing a pointer
node.next = otherNode;
```

---

export default WideLayout

```txt
A linked list (rolling our own pointers):
-----------------------------------------
head     = 0
values   = [a, b, c]
next     = [1, 2, 0]
```

```js
// Using byte arrays (capacity is fixed)
function LinkedList(capacity) {
  this.head = 0;
  this.next = new Uint16Array(capacity);
  this.values = new Array(capacity);
}
// Changing a pointer;
this.next[nodeIndex] = otherNodeIndex;
```

---

<Title level={3}>Let's build a most efficient LRU Cache!</Title>

* An object with maximum number of keys to save up some RAM.
* If we add a new key and we are full, we drop the **L**east **R**ecently **U**sed one.
* Useful to implement caches & memoization.

---

export default WideLayout

```txt
A ~doubly~ linked list:
-----------------------
head     = 0
tail     = 2
next     = [1, 2, 0]
prev     = [0, 1, 2]

Same as (with pointers):
------------------------
head -> (a) <-> (b) <-> (c) <- tail

A map to pointers & values:
---------------------------
items  = {a: 0, b: 1, c: 2}
values = [a, b, c]
```

---

| name                                                           | set   | get1  | update | get2  | evict |
|----------------------------------------------------------------|-------|-------|--------|-------|-------|
| [mnemonist-object](https://www.npmjs.com/package/mnemonist)    | 15314 | 69444 | 35026  | 68966 | 7949  |
| [tiny-lru](https://npmjs.com/package/tiny-lru)                 | 6530  | 46296 | 37244  | 42017 | 5961  |
| [lru-fast](https://npmjs.com/package/lru-fast)                 | 5979  | 36832 | 32626  | 40900 | 5929  |
| [mnemonist-map](https://www.npmjs.com/package/mnemonist)       | 6272  | 15785 | 10923  | 16077 | 3738  |
| [lru](https://www.npmjs.com/package/lru)                       | 3927  | 5454  | 5001   | 5366  | 2827  |
| [simple-lru-cache](https://npmjs.com/package/simple-lru-cache) | 3393  | 3855  | 3701   | 3899  | 2496  |
| [hyperlru-object](https://npmjs.com/package/hyperlru-object)   | 3515  | 3953  | 4044   | 4102  | 2495  |
| [js-lru](https://www.npmjs.com/package/js-lru)                 | 3813  | 10010 | 9246   | 10309 | 1843  |

<small>
  Bench <a href="https://github.com/dominictarr/bench-lru">here</a> - I masked libraries which are not LRU per se.
</small>

---

<Title affix="3.1.4." level={3}>Function calls are costly</Title>

Everything is costly. Life is harsh.

This means that rolling your own stack will always beat recursion.

---

export default WideLayout

```js
// Recursive version - "easy"
function recurse(node, key) {
  if (key < node.value) {
    if (node.left)
      return recurse(node.left, key);

    return false;
  }
  else if (key > node.value) {
    if (node.right)
      return recurse(node.right, key);

    return false;
  }

  return true;
}
```

---

export default WideLayoutWithSmallCode

```js
// Iterative version - more alien but faster, mileage may vary
function iterative(root, key) {
  const stack = [root];
  while (stack.length) {
    const node = stack.pop();
    if (key < node.value) {
      if (node.left)
        stack.push(node.left);
      else
        break;
    }
    else if (key > node.value) {
      if (node.right)
        stack.push(node.right);
      else
        break;
    }
    return true;
  }
  return false;
}
```

<!-- ---

<Title affix="3.1.5." level={3}>Algorithmics vs. the real world</Title>

A better big O is not necessary better in the real world.

Dijkstra requires a Fibonacci heap, but in JavaScript, a binomial heap will often win. -->

---

<Title affix="3.1.6." level={3}>What about wasm etc. ?</Title>

Lots of shiny options:

1. asm.js
2. WebAssembly
3. Native code binding in Node.js

---

Communication between those and JavaScript has a cost that negates the benefit.

This is only viable if you have long running code or don't need the bridge between the layer and JavaScript.

<!-- ---

<Title affix="3.2." level={3}>Respecting the ecosystem</Title> -->

<!-- ---

<Title affix="3.2.1." level={3}>Naming matters</Title>

Yes `#.unshift` is a terrible name. <small className="very-small">I am still grateful we avoided <a href="https://github.com/tc39/proposal-flatMap/pull/56">smooshing</a> arrays.</small>

But consistency is always a plus. -->

<!-- ---

<Title affix="3.2.1." level={3}>Implement well-known interfaces</Title>

```js
// Handling JSON serialization
Structure.prototype.toJSON = function() {
  return this.items;
};

JSON.stringify(structure);
>>> ['something', 'easily', 'serializable']

Structure.prototype.toString = function() {
  return somethingUsefulForOnce;
};
``` -->

<!-- ---

export default WideLayout

<Title affix="3.2.2." level={3}>Make structure iterables</Title>

```js
// Implement #.forEach "iterators"
queue.forEach(item => console.log(item));

// Implement the iterator protocol
Queue.prototype.values = function() {

  return {
    next: function() {
      // ...
      return {
        done: false,
        value: whatever
      };
    }
  };
};
``` -->

<!-- ---

export default WideLayout

<Title affix="3.2.2." level={3}>Make structure iterables</Title>

```js
// Make objects iterable
Queue.prototype[Symbol.iterator] = Queue.prototype.values;

// Behold the future!
for (const item of queue) {
  console.log(item);
}
``` -->

<!-- ---

<Title affix="3.2.3." level={3}>Custom Node.js inspection</Title>

```js
// Earlier (now Symbol.for('nodejs.util.inspect.custom'))
Queue.prototype.inspect = function() {
  return somethingUseful;
};

console.log(queue);
>>> Queue [1, 2, 3]

// Rather than:
>>> {
  size: 1,
  __items: [5, 1, 4],
  __someVeryVeryPrivateAndUglyName: true
}
``` -->

---

<Title affix="4.">Parting words</Title>

---

<Title affix="4.1." level={3}>Yes, optimizing JavaScript is hard.</Title>

---

<Title affix="4.2." level={3}>But it does not mean we cannot do it.</Title>

---

<Title affix="4.3." level={3}>Most tips are applicable to every high-level languages.</Title>

---

<Title affix="4.4." level={3}>But JavaScript has its very own kinks</Title>

The `ByteArray` tips absolutely don't work in python.

It's even slower if you use `numpy` arrays. (you need to go full native).

---

<Title level={3}>The gist</Title>

To be efficient your code must be **statically interpretable**.

If you do that:

1. The engine will have **no hard decisions** to make
2. And will safely choose the most aggressive optimization paths

---

<Title level={3}>Rephrased</Title>

Optimizing JavaScript = squinting a little and **pretending** really hard that:

1. The language has static typing.
2. That the language is low-level.

---

<Title affix="4.5." level={3}>Associative arrays are the next frontier</Title>

For now, there is no way to beat JavaScript's objects and maps when doing key-value association.

*Yet...*

---

<Title affix="5.">So implement away!</Title>

---

<Title affix="5.">References</Title>

Examples were taken from the following libraries:

* [mnemonist](https://yomguithereal.github.io/mnemonist): <small><em>yomguithereal.github.io/mnemonist</em></small>
* [graphology](https://graphology.github.io): <small><em>graphology.github.io</em></small>
* [sigma.js](http://sigmajs.org): <small><em>sigmajs.org</em></small>

---

# Thanks!
